NSF PAR Search | NSF Public Access Repository

Program Analysis for Adaptive Data Analysis

https://doi.org/10.1145/3656414

Liu, Jiawen; Qu, Weihao; Gaboardi, Marco; Garg, Deepak; Ullman, Jonathan (June 2024, Proceedings of the ACM on Programming Languages)

Data analyses are usually designed to identify some property of the population from which the data are drawn, generalizing beyond the specific data sample. For this reason, data analyses are often designed in a way that guarantees that they produce a low generalization error. That is, they are designed so that the result of a data analysis run on a data sample does not differ too much from the result one would achieve by running the analysis over the entire population. An adaptive data analysis can be seen as a process composed of multiple queries interrogating some data, where the choice of which query to run next may rely on the results of previous queries. The generalization error of each individual query/analysis can be controlled by using an array of well-established statistical techniques. However, when queries are arbitrarily composed, the different errors can propagate through the chain of different queries and lead to a high generalization error. To address this issue, data analysts have been designing several techniques that not only guarantee bounds on the generalization errors of single queries, but also guarantee bounds on the generalization error of the composed analyses. The choice of which of these techniques to use often depends on the chain of queries that an adaptive data analysis can generate. In this work, we consider adaptive data analyses implemented as while-like programs and we design a program analysis that can help identify which technique to use to control their generalization errors. More specifically, we formalize the intuitive notion of adaptivity as a quantitative property of programs. We do this because the adaptivity level of a data analysis is a key measure for choosing the right technique. Based on this definition, we design a program analysis for soundly approximating this quantity.
The program analysis generates a representation of the data analysis as a weighted dependency graph, where the weight is an upper bound on the number of times each variable can be reached, and uses a path search strategy to guarantee an upper bound on the adaptivity. We implement our program analysis and show that it can help analyze the adaptivity of several concrete data analyses with different adaptivity structures.
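The idea of bounding adaptivity by searching paths in a weighted dependency graph can be illustrated with a small Python sketch. It is a simplified bound under stated assumptions, not the paper's algorithm: the graph is given as a hypothetical adjacency map `succ` in which an edge u -> v means that v's value depends on u, `weight[v]` is an assumed upper bound on how often v can be reached, and `is_query[v]` marks variables assigned the result of a query. The sketch (functions `sccs` and `adaptivity_bound` are illustrative names) collapses strongly connected components (loops), lets a walk count each query vertex inside a loop up to its weight, counts every other vertex at most once, and takes the longest path over the resulting acyclic graph. This soundly bounds the number of query visits along any walk, but it may be looser than the analysis described in the paper.

```python
from graphlib import TopologicalSorter


def sccs(succ):
    """Tarjan's algorithm: return the strongly connected components of `succ` as sets."""
    index, low, on_stack, stack, comps = {}, {}, set(), [], []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in succ:
        if v not in index:
            visit(v)
    return comps


def adaptivity_bound(succ, weight, is_query):
    """Upper-bound the number of query visits on any walk of the dependency graph.

    succ:     vertex -> list of vertices that depend on it (every vertex must be a key).
    weight:   vertex -> assumed upper bound on how many times the vertex can be reached.
    is_query: vertex -> True if the vertex is assigned the result of a query.
    """
    comps = sccs(succ)
    comp_of = {v: i for i, c in enumerate(comps) for v in c}

    def contribution(c):
        cyclic = len(c) > 1 or any(v in succ[v] for v in c)
        if cyclic:
            # A walk may revisit loop vertices, but each at most weight[v] times.
            return sum(weight[v] for v in c if is_query[v])
        # Outside a loop, a walk passes through a vertex at most once.
        return sum(1 for v in c if is_query[v])

    # Condensation DAG: map each component to the components it depends on.
    preds = {i: set() for i in range(len(comps))}
    for u in succ:
        for v in succ[u]:
            if comp_of[u] != comp_of[v]:
                preds[comp_of[v]].add(comp_of[u])

    # Longest path in the condensation, visiting dependencies first.
    best = {}
    for i in TopologicalSorter(preds).static_order():
        best[i] = contribution(comps[i]) + max((best[p] for p in preds[i]), default=0)
    return max(best.values(), default=0)
```

For instance, modelling a two-round analysis as a query vertex `a` reached 10 times without depending on itself, followed by one query `r` that reads `a`, gives `adaptivity_bound({"a": ["r"], "r": []}, {"a": 10, "r": 1}, {"a": True, "r": True})`, which evaluates to 2; adding a self-dependency (`"a": ["a", "r"]`, each query reading the previous answer) raises the bound to 11.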

Formalizing Algorithmic Bounds in the Query Model in EasyCrypt

Stoughton, Alley; Chen, Carol; Gaboardi, Marco; Qu, Weihao (August 2022, 13th International Conference on Interactive Theorem Proving (ITP 2022), Leibniz International Proceedings in Informatics)
Andronick, June; de Moura, Leonardo (Ed.)
We use the EasyCrypt proof assistant to formalize the adversarial approach to proving lower bounds for computational problems in the query model. This is done using a lower bound game between an algorithm and adversary, in which the adversary answers the algorithm’s queries in a way that makes the algorithm issue at least the desired number of queries. A complementary upper bound game is used for proving upper bounds of algorithms; here the adversary incrementally and adaptively realizes an algorithm’s input. We prove a natural connection between the lower and upper bound games, and apply our framework to three computational problems, including searching in an ordered list and comparison-based sorting, giving evidence for the generality of our notion of algorithm and the usefulness of our framework.
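To illustrate the shape of a lower bound game outside EasyCrypt, here is a minimal Python sketch for searching in an ordered list of n elements. It assumes a simplified model that is not the paper's formalization: the input is only the hidden position of the key, and each query asks "is the position at most i?". The adversary (the hypothetical class `HalvingAdversary`) tracks the positions still consistent with its past answers and always keeps the larger half, so every answer leaves at least half of the candidates and any correct algorithm must ask at least ⌈log2 n⌉ queries. The names `lower_bound_game` and `binary_search` are likewise illustrative.

```python
class HalvingAdversary:
    """Hypothetical adversary for searching an ordered list of n elements.

    The hidden input is the position of the key (0..n-1). Each query asks
    "is the position <= i?". The adversary keeps the positions that are
    still consistent with its answers and lets the larger part survive.
    """

    def __init__(self, n):
        self.lo, self.hi = 0, n  # consistent positions are lo, ..., hi - 1
        self.queries = 0

    def leq(self, i):
        self.queries += 1
        yes = max(0, min(self.hi, i + 1) - self.lo)  # consistent positions <= i
        no = (self.hi - self.lo) - yes  # consistent positions > i
        if yes >= no:
            self.hi = self.lo + yes
            return True
        self.lo = self.hi - no
        return False


def lower_bound_game(algorithm, n):
    """Run `algorithm(n, leq)` against the adversary and return its query count.

    The algorithm is correct only if, when it stops, exactly one position is
    consistent with the answers and it names that position.
    """
    adversary = HalvingAdversary(n)
    answer = algorithm(n, adversary.leq)
    if adversary.hi - adversary.lo != 1 or answer != adversary.lo:
        raise ValueError("the answer is not forced by the adversary's replies")
    return adversary.queries


def binary_search(n, leq):
    """Binary search over positions 0..n-1 using only `leq` queries."""
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if leq(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Against this adversary, `lower_bound_game(binary_search, n)` should use exactly ⌈log2 n⌉ queries (for example 3 when n = 8), matching the lower bound, while an algorithm that answers without querying enough is rejected.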
Bidirectional type checking for relational properties

https://doi.org/10.1145/3314221.3314603

Çiçek, Ezgi; Qu, Weihao; Barthe, Gilles; Gaboardi, Marco; Garg, Deepak (June 2019, Programming Language Design and Implementation (PLDI))

Relational cost analysis for functional-imperative programs

https://doi.org/10.1145/3341696

Qu, Weihao; Gaboardi, Marco; Garg, Deepak (July 2019, Proceedings of the ACM on Programming Languages)

Relational cost analysis aims to formally establish bounds on the difference in the evaluation costs of two programs. As a particular case, one can also use relational cost analysis to establish bounds on the difference in the evaluation cost of the same program on two different inputs. One way to perform relational cost analysis is to use a relational type-and-effect system that supports reasoning about relations between two executions of two programs. Building on this basic idea, we present a type-and-effect system, called ARel, for reasoning about the relative cost of array-manipulating, higher-order functional-imperative programs. The key ingredient of our approach is a new lightweight type refinement discipline that we use to track relations (differences) between two mutable arrays. This discipline, combined with Hoare-style triples built into the types, allows us to express and establish precise relative costs of several interesting programs that imperatively update their data. We have implemented ARel using ideas from bidirectional type checking.